Abstract — In this paper, adaptive estimation based on noisy 
quantized observations is studied. A low complexity adaptive 
algorithm using a quantizer with adjustable input gain and offset 
is presented. Three possible scalar models for the parameter 
to be estimated are considered: constant, Wiener process and 
Wiener process with deterministic drift. After showing that the 
algorithm is asymptotically unbiased for estimating a constant, 
it is shown, in the three cases, that the asymptotic mean squared 
error depends on the Fisher information for the quantized 
measurements. It is also shown that the loss of performance due 
to quantization depends approximately on the ratio of the Fisher 
information for quantized and continuous measurements. At the 
end of the paper the theoretical results are validated through 
simulation under two different classes of noise, generalized 
Gaussian noise and Student's-t noise. 

Index Terms — Parameter estimation, adaptive estimation, 
quantization. 

I. Introduction 

CONTINUOUS advances in the development of cheaper 
and smaller sensors and communication devices moti- 
vated the introduction of sensor networks in many different 
domains, e.g. military applications, infrastructure security, 
environment monitoring, industrial applications and traffic 
monitoring [1]. When designing a sensing system, one must 
account not only for the physical perturbations that can affect 
sensing performance, more specifically noise, but also for the 
inherent design constraints such as bandwidth and complexity 
limitations. Commonly, the effect of the noise in system per- 
formance is taken into account, but bandwidth and complexity 
constraints are neglected. 

One simple way to respect bandwidth constraints is to 
compress sensor information using quantizers. The theory of 
quantizer design for reducing distortion in the measurement 
representation is well established in the literature [2]; however, 
far fewer results can be found when the quantities to be 
reconstructed are not the measurements themselves but an underlying 
parameter embedded in noise. 

In [3], noisy samples of a constant are taken using a uniform 
quantizer with an input offset, and the output samples of the 
quantizer are used to estimate the constant. Using this type of 
measurement system, results for different types of offset were 
obtained. The types of offset considered were known constant 
and variable offset, random offset and offset based on feedback 
of the output measurements. The comparison was performed 
based on the Cramér-Rao bound (CRB) ratio, which is the 
worst case ratio between the CRB for quantized measurements 
and continuous measurements. It was shown that the last type 
of offset, based on feedback, was the most efficient one. 
Another interesting result from [3] is that in the Gaussian 
noise case with one bit quantized measurements, the minimum 
CRB ratio that can be attained is $\pi/2$. This result was used 
as a motivation for [4] to study in more detail estimation 
under Gaussian noise and binary quantization. In [4], it was 
shown that the CRB for a fixed known threshold can be upper 
bounded by the exponential of the squared difference between 
the threshold and the constant to be estimated. This means that 
the closer the threshold is to the parameter to be estimated 
with binary measurements, the lower can be the estimation 
variance. It was also pointed out that an iterative algorithm 
could be used to adjust the threshold exactly to be the last 
estimate of the parameter. 

An adaptive algorithm for placing the threshold was detailed 
in [5], where a sensor network extension was also proposed. 
At each time step, a sensor measures one bit, updates its 
threshold using a simple cumulative sum and broadcasts the 
new threshold to the other sensors and to a fusion center. Thus, 
the thresholds are placed around the parameter in an adaptive 
way and at the fusion center the broadcasted bits are used 
to obtain a more precise estimate of the parameter. Two other 
methods for updating the thresholds were presented in [6]: one 
method used a more refined cumulative sum based on the last 
two measured bits; the other proposed method was to estimate 
the parameter using a maximum likelihood method and then 
set the threshold at the estimate of the parameter. It was shown 
that in the asymptotic case (large number of iterates) the 
CRB for the fusion center estimate using maximum likelihood 
threshold updates converges to the minimum possible CRB, 
which is the CRB when the threshold is placed exactly at the 
parameter. 

Along the same lines as the work mentioned above, algorithms 
for estimating a scalar parameter from multiple bit quantized 
noisy measurements are proposed here. The algorithms developed 
in this work are based on low complexity adaptive techniques 
that can be easily implemented in practice. The mean and 
mean squared error (MSE) are obtained for a general class of 
symmetrically distributed noise and three types of parameter 
evolution: constant, Wiener process and Wiener process with 
drift. As in related work [3], the loss of estimation performance 
due to quantization is also evaluated and the validity of the 
performance results is verified through simulation. 

The main contributions of this work are: 

• Design and analysis of adaptive estimation algorithms 
based on multiple bit quantized noisy measurements, 
unlike [5] and [6], where only binary quantization 
is treated. 

• Explicit performance analysis for tracking of a varying 
parameter. In [3]-[6] the parameter is set to be constant 
and all subsequent analysis is based on this hypothesis. 

• Low complexity algorithms. The algorithms proposed 
here are based on simple recursive techniques that have 
lower complexity than the maximum likelihood methods 
used in [5] and [6]. 

The paper is structured in the following form: in section II 
the problem is stated and the main assumptions are made, in 
section III the general adaptive algorithm and results from 
adaptive algorithms theory are presented, then in section 
IV the parameters of the adaptive algorithm are obtained. 
Section V contains theoretical performance results and also the 
simulation of the algorithm. Section VI concludes the paper. 

II. Problem statement 

Let $X$ be a stochastic process defined on the probability space $\mathcal{P} = (\Omega, \mathcal{F}, \mathbb{P})$ with values on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. At each instant $k \in \mathbb{N}^*$, the corresponding scalar random variable (r.v.) $X_k$ is given by the following model:

$$X_k = X_{k-1} + W_k, \qquad (1)$$

where $W_k$ is a sequence of independent Gaussian random variables whose mean is given by a small amplitude deterministic unknown sequence $u_k$ and whose standard deviation $\sigma_w$ is small and known:

$$W_k \sim \mathcal{N}\left(u_k, \sigma_w^2\right). \qquad (2)$$

The initial condition $X_0$ will be considered to be an unknown deterministic constant.

The model expressed in (1) is a compact form to describe three different evolution models for $X_k$:

• Constant: by taking $u_k = \sigma_w = 0$, then $X_k = X_0 = x$ is an unknown deterministic constant.

• Wiener process: if $u_k = 0$ and $\sigma_w > 0$ is small, then $X_k$ is a slowly varying Wiener process. This model is commonly used to describe a slowly varying parameter of a system when the model for its evolution is random but of unknown form.

• Wiener process with drift: in this case $u_k$ and $\sigma_w$ are nonzero and have small amplitudes. A nonzero $u_k$ gives the Wiener process a drift, thus representing a model with a deterministic component that is perturbed by small random fluctuations.

The process $X$ is observed through $Y$ and they are related as follows:

$$Y_k = X_k + V_k, \qquad (3)$$

where the noise $V_k$ is a sequence of additive independent and identically distributed (i.i.d.) r.v. which is also independent of $W_k$. The cumulative distribution function (CDF) of $V_k$ will be denoted by $F$. Some assumptions on $F$ are stated below.

Assumptions (on the noise distribution):
A1. $F$ is locally Lipschitz continuous.

A2. $F$ admits a probability density function (PDF) $f$ with respect to (w.r.t.) the standard Lebesgue measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.

A3. The PDF $f(x)$ is an even function and it strictly decreases w.r.t. $|x|$.

The first assumption is required by the method of analysis that will be used to assess the performance of the proposed algorithms. Most noise CDFs considered in practice are Lipschitz continuous, thus the first assumption is generally satisfied. Assumption 2 is a commonly used assumption that in practice will be used when the derivative of $F$ w.r.t. its argument is needed. Assumption 3 will be used to prove the asymptotic convergence of the algorithms and it is also commonly satisfied in practice.

The observations are quantized using an adjustable quantizer whose output is given by

$$i_k = Q\left(\frac{Y_k - b_k}{\Delta_k}\right), \qquad (4)$$

where $i_k$ is an integer defined on a finite set of $N_I$ integers, $N_I$ being the number of quantization intervals. The quantizer parameters $b_k$ and $\frac{1}{\Delta_k}$ are sequences of adjustable offsets and gains respectively. The function $Q$ represents a static normalized quantizer and it is characterized by $N_I + 1$ thresholds. For simplification purposes some assumptions on the quantizer will be used.

Assumptions (on the quantizer):
A4. $N_I$ will be considered to be an even natural number and

$$i_k \in \mathcal{I} = \left\{-\frac{N_I}{2}, \dots, -1, +1, \dots, +\frac{N_I}{2}\right\}.$$

A5. It will be assumed that the static quantizer is symmetric and centered at zero. This means that the vector of thresholds$^1$

$$\boldsymbol{\tau} = \left[\tau_{-\frac{N_I}{2}} \ \cdots \ \tau_{-1} \ \ \tau_0 \ \ \tau_1 \ \cdots \ \tau_{\frac{N_I}{2}}\right]^T$$

has elements given by the following expressions

$$\tau_0 = 0, \qquad \tau_i = -\tau_{-i} \ \ \forall i \in \left\{1, \dots, \frac{N_I}{2}\right\}, \qquad \tau_{\frac{N_I}{2}} = +\infty. \qquad (5)$$

$^1$ Infinite thresholds are used to have the same notation for the probabilities of the granular and overload regions.

These assumptions will be used later to simplify the choice of parameters of the algorithms.

For $i \in \left\{1, \dots, \frac{N_I}{2}\right\}$ and $\frac{|Y_k - b_k|}{\Delta_k} \in [\tau_{i-1}, \tau_i)$, the adjustable quantizer output is given by

$$i_k = Q\left(\frac{Y_k - b_k}{\Delta_k}\right) = i \, \mathrm{sign}\left(Y_k - b_k\right). \qquad (6)$$

A scheme representing the quantizer is given in Fig. 1. Note that even if the quantizer is not uniform (with constant distance between thresholds), it can be implemented using a uniform quantizer with a compander approach [2].

Fig. 1. Scheme representing the adjustable quantizer. The offset and gain can be adjusted dynamically while the quantizer thresholds are fixed. (Blocks shown: adjustable offset, adjustable gain, static quantizer.)

Based on the quantizer outputs the main objective is to estimate $X_k$ and a secondary objective is to adjust the parameters $b_k$ and $\Delta_k$ to enhance estimation performance. As the estimate $\hat{X}_k$ of $X_k$ will possibly be used in real time applications, it might be estimated online, which means that $\hat{X}_k$ will only depend on past and present $i_k$. To simplify, it will be considered that the offset is set to $\hat{X}_{k-1}$ and that the gain is set to a constant $\frac{1}{\Delta}$. For the adaptive algorithm presented later, setting the offset to $\hat{X}_{k-1}$ results in an asymptotic performance that does not depend on the mean of $X_k$, thus simplifying the analysis. The choice of $\Delta$ is discussed in section IV.

The general scheme for the estimation of $X_k$ is depicted in Fig. 2 and the main objective will be to find a low complexity algorithm that will be placed in the block named Update.

Fig. 2. Block representation of the estimation scheme. The estimation algorithm and the procedures to set the offset and the gain are represented by the Update block. (Signals shown: quantized measurements, estimate.)

III. General algorithm

A simple and general form for the estimation algorithm that respects the constraints defined above (low complexity and online) is the following adaptive algorithm:

$$\hat{X}_k = \hat{X}_{k-1} + \gamma_k \, \eta\left[Q\left(\frac{Y_k - \hat{X}_{k-1}}{\Delta}\right)\right]. \qquad (7)$$

In the expression above, $\gamma_k$ is a sequence of positive real gains and $\eta[\cdot]$ is a mapping from $\mathcal{I}$ to $\mathbb{R}$ that is defined as a sequence of $N_I$ coefficients $\left\{\eta_{-\frac{N_I}{2}}, \dots, \eta_{-1}, \eta_1, \dots, \eta_{\frac{N_I}{2}}\right\}$. These coefficients are equivalent to the output quantization levels used in quantization theory. The use of this algorithm is also motivated by the following observations:

• When estimating a constant, the maximum likelihood estimator can be approximated by a simpler online algorithm using a stochastic gradient ascent algorithm, which has the same form as (7). It is shown in section IV that, for the optimal choice of $\eta_i$, (7) is equivalent to a stochastic gradient ascent method to maximize the log-likelihood.

• To estimate a Wiener process, a simple choice of estimator is a Kalman-filter-like method based on the quantized innovation, which also has the form (7).

Due to the symmetry of the noise distribution, when $\hat{X}_k$ is close to $X_k$, it seems reasonable to suppose that the corrections given by the output quantizer levels have odd symmetry, with positive values for positive $i$. This symmetry will be useful later for simplification purposes. Thus, one assumption will be added to A1-A5.

Assumption (on the quantizer output levels):
A6. The quantizer output levels have odd symmetry w.r.t. $i$:

$$\eta_i = -\eta_{-i}, \qquad (8)$$

with $\eta_i > 0$ for $i > 0$.
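
As an illustration of the signal model (1), the observation model (3) and the adjustable quantizer (4)-(6), the following Python sketch may help. It is not taken from the paper: the function names are hypothetical, the noise is assumed Gaussian, and the normalized thresholds are assumed uniform ($\tau_i = i$ for $i < N_I/2$, with the last threshold at infinity).

```python
import math
import random


def simulate_parameter(n, x0, u=0.0, sigma_w=0.0, rng=None):
    """Return [X_1, ..., X_n] with X_k = X_{k-1} + W_k and W_k ~ N(u, sigma_w^2).

    u = sigma_w = 0 gives the constant model, u = 0 the Wiener process,
    and u != 0 the Wiener process with (here constant) drift.
    """
    rng = rng or random.Random(0)
    xs, x = [], x0
    for _ in range(n):
        x += rng.gauss(u, sigma_w)
        xs.append(x)
    return xs


def quantize(y, offset, delta, n_intervals):
    """Adjustable quantizer with uniform normalized thresholds (an assumption).

    Returns a signed index in {-N/2, ..., -1, +1, ..., +N/2}.
    """
    half = n_intervals // 2
    z = abs(y - offset) / delta            # normalized magnitude
    i = min(int(math.floor(z)) + 1, half)  # z in [i-1, i); overload clipped to N/2
    sign = 1 if y >= offset else -1
    return sign * i


rng = random.Random(1)
xs = simulate_parameter(5, x0=1.0, rng=rng)          # constant model
ys = [x + rng.gauss(0.0, 1.0) for x in xs]           # noisy observations
indices = [quantize(y, offset=0.0, delta=1.0, n_intervals=4) for y in ys]
```

Only the quantizer indices would be available to the estimator; the continuous observations `ys` are kept here for illustration.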
The non-differentiable nonlinearity in (7) makes the algorithm difficult to analyze. Fortunately, an analysis based on mean approximations was developed in [7] for a wide class of adaptive algorithms. Within this framework, the function $\eta$ can be a general nonlinear, non-differentiable function of $Y_k$ and $\hat{X}_{k-1}$, and it was shown that the gains $\gamma_k$ that optimize the estimation of $X_k$ should be as follows:

• $\gamma_k \propto \frac{1}{k}$ when $X_k$ is constant.

• $\gamma_k$ is constant for a Wiener process $X_k$.

• $\gamma_k \propto u_k^{2/3}$ when $X_k$ is a Wiener process with drift.

In the following parts of this section the results of [7] will be applied for the analysis of (7) in the three evolution models of $X_k$.
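
The recursion (7) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: uniform normalized thresholds, Gaussian noise, a binary quantizer with $\eta_1 = 1$, and a hand-picked gain constant. The names `quantize` and `adaptive_estimate` are hypothetical.

```python
import math
import random


def quantize(y, offset, delta, n_intervals):
    """Signed quantizer index with uniform normalized thresholds (assumption)."""
    half = n_intervals // 2
    i = min(int(math.floor(abs(y - offset) / delta)) + 1, half)
    return i if y >= offset else -i


def adaptive_estimate(ys, eta, delta, gain, x_init=0.0):
    """Run x_hat <- x_hat + gain(k) * sign(i_k) * eta[|i_k|], as in (7).

    eta  : [eta_1, ..., eta_{N/2}], positive output levels (odd symmetry, A6)
    gain : function k -> gamma_k, with k starting at 1
    """
    n_intervals = 2 * len(eta)
    x_hat, path = x_init, []
    for k, y in enumerate(ys, start=1):
        i_k = quantize(y, x_hat, delta, n_intervals)  # offset = last estimate
        x_hat += gain(k) * math.copysign(eta[abs(i_k) - 1], i_k)
        path.append(x_hat)
    return path


rng = random.Random(0)
x_true = 1.0
ys = [x_true + rng.gauss(0.0, 1.0) for _ in range(5000)]
# Constant model: decreasing gains gamma_k = gamma / k (gamma = 1.25 is illustrative).
path = adaptive_estimate(ys, eta=[1.0], delta=1.0, gain=lambda k: 1.25 / k)
```

For a Wiener process the same function can be called with a constant gain, for example `gain=lambda k: 0.05`.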

A. Constant $X_k$

In this case $X_k = x$. To obtain convergence of $\hat{X}_k$ to a constant, the gains must be:

$$\gamma_k = \frac{\gamma}{k}. \qquad (9)$$

For large $k$, the mean trajectory of $\hat{X}_k$ can be approximated using the ordinary differential equation (ODE) method. The ODE method approximates the expectation of the estimator $E[\hat{X}_k]$ by $\hat{x}(t_k)$, where $\hat{x}(t)$ is the solution of

$$\frac{d\hat{x}}{dt} = \gamma h(\hat{x}). \qquad (10)$$

The correspondence between continuous and discrete time is given by $t_k = \sum_{j=1}^{k} \frac{1}{j}$ and $h(\hat{x})$ is the following:

$$h(\hat{x}) = E\left[\eta\left[Q\left(\frac{x - \hat{x} + V}{\Delta}\right)\right]\right], \qquad (11)$$

where the expectation is evaluated w.r.t. $F(v)$.



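For the solution of (10) to be valid as an approximation of $E[\hat{X}_k]$, $h(\hat{x})$ has to be a locally Lipschitz continuous function of $\hat{x}$. Using the assumptions on the quantizer thresholds and output levels, the expectation in (11) can be written as:

$$h(\hat{x}) = \sum_{i=1}^{N_I/2} \left[\eta_i F_d(i, \hat{x}, x) - \eta_i F_d(-i, \hat{x}, x)\right], \qquad (12)$$

where $F_d$ is a difference of CDFs:

$$F_d(i, \hat{x}, x) = \begin{cases} F(\tau_i \Delta + \hat{x} - x) - F(\tau_{i-1} \Delta + \hat{x} - x) & \text{if } i \in \left\{1, \dots, \frac{N_I}{2}\right\}, \\ F(\tau_{i+1} \Delta + \hat{x} - x) - F(\tau_i \Delta + \hat{x} - x) & \text{if } i \in \left\{-1, \dots, -\frac{N_I}{2}\right\}. \end{cases} \qquad (13)$$

From assumption A1, the function $h$ is a linear combination of locally Lipschitz continuous functions, which implies that $h$ is also locally Lipschitz continuous, thus the ODE method can be applied.

If $\hat{x} \to x$ when $t \to \infty$ for all $x$ and all $\hat{x}(0)$, the adaptive algorithm is asymptotically unbiased, and in this case it can also be shown, using a central limit theorem, that the estimation error is asymptotically distributed as a Gaussian r.v. [7, pp. 109]:

$$\sqrt{k}\left(\hat{X}_k - x\right) \underset{k \to \infty}{\rightsquigarrow} \mathcal{N}\left(0, \sigma_\infty^2\right), \qquad (14)$$

where the asymptotic variance is given by:

$$\sigma_\infty^2 = \frac{\gamma^2 R(x)}{-2\gamma h_{\hat{x}}(x) - 1}. \qquad (15)$$

The term denoted $R$ in the numerator is the variance of the adaptive algorithm normalized increments $\left(\frac{\hat{X}_k - \hat{X}_{k-1}}{\gamma_k}\right)$ when $\hat{x}$ is equal to $x$. From A3 and A6, $h(\hat{x}) = 0$ when $\hat{x} = x$ and this variance can be written as the second order moment of the quantizer output levels:

$$R(x) = \mathrm{Var}\left[\eta\left[Q\left(\frac{x - \hat{x} + V}{\Delta}\right)\right]\right]_{\hat{x} = x} = \sum_{i=1}^{N_I/2} \left(\eta_i^2 F_d(i, x, x) + \eta_{-i}^2 F_d(-i, x, x)\right) = 2\sum_{i=1}^{N_I/2} \eta_i^2 F_d(i, x, x), \qquad (16)$$

where the last equality comes from the symmetry assumptions.

The term in the denominator is the derivative of $h$ when $\hat{x}$ is equal to $x$:

$$h_{\hat{x}}(x) = \left.\frac{dh}{d\hat{x}}\right|_{\hat{x} = x} = -\sum_{i=1}^{N_I/2} \left[\eta_i f_d(i, x, x) - \eta_i f_d(-i, x, x)\right], \qquad (17)$$

with

$$f_d(i, \hat{x}, x) = \begin{cases} f(\tau_{i-1} \Delta + \hat{x} - x) - f(\tau_i \Delta + \hat{x} - x) & \text{if } i \in \left\{1, \dots, \frac{N_I}{2}\right\}, \\ f(\tau_i \Delta + \hat{x} - x) - f(\tau_{i+1} \Delta + \hat{x} - x) & \text{if } i \in \left\{-1, \dots, -\frac{N_I}{2}\right\}. \end{cases} \qquad (18)$$

From the symmetry assumptions, $f_d(i, x, x)$ is odd w.r.t. $i$, thus (17) can be rewritten as

$$h_{\hat{x}}(x) = -2\sum_{i=1}^{N_I/2} \eta_i f_d(i, x, x). \qquad (19)$$

Minimizing $\sigma_\infty^2$ w.r.t. the positive gain $\gamma$ gives

$$\gamma^\star = -\frac{1}{h_{\hat{x}}(x)}, \qquad (20)$$

$$\sigma_\infty^2 = \frac{R(x)}{h_{\hat{x}}^2(x)}. \qquad (21)$$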



When $\hat{x} = x$, the functions $F_d(i, x, x)$ and $f_d(i, x, x)$ do not depend on $x$ anymore, thus from now on they will be denoted $F_d[i]$ and $f_d[i]$. The functions $R(x)$ and $h_{\hat{x}}(x)$ do not depend on $x$ either, thus they will be denoted by the constants $R$ and $h_{\hat{x}}$ respectively.

To specify the adaptive algorithm completely, the quantizer parameters $\eta_i$, $\boldsymbol{\tau}$ and $\Delta$ can be chosen to minimize (21).
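
The quantities $R$, $h_{\hat{x}}$ and the asymptotic variance of the constant case can be evaluated numerically. The sketch below is an illustration only: it assumes unit-variance Gaussian noise and uses hypothetical helper names. It evaluates a binary quantizer with $\eta_1 = 1$, for which the minimum of the asymptotic variance over the gain equals $R / h_{\hat{x}}^2$.

```python
import math


def gauss_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def gauss_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def interval_terms(taus, delta):
    """F_d[i] and f_d[i] for i = 1..N/2 at x_hat = x.

    taus = [tau_0 = 0, tau_1, ..., tau_{N/2} = inf] (normalized thresholds).
    """
    F_d = [gauss_cdf(taus[i] * delta) - gauss_cdf(taus[i - 1] * delta)
           for i in range(1, len(taus))]
    f_d = [gauss_pdf(taus[i - 1] * delta) - gauss_pdf(taus[i] * delta)
           for i in range(1, len(taus))]
    return F_d, f_d


def slope_and_increment_variance(eta, taus, delta):
    """Return (h_x, R): derivative of h at x_hat = x and increment variance."""
    F_d, f_d = interval_terms(taus, delta)
    h_x = -2.0 * sum(e * f for e, f in zip(eta, f_d))
    R = 2.0 * sum(e * e * F for e, F in zip(eta, F_d))
    return h_x, R


def asymptotic_variance(gamma, h_x, R):
    """Asymptotic variance for gains gamma / k; needs -2 * gamma * h_x - 1 > 0."""
    return gamma ** 2 * R / (-2.0 * gamma * h_x - 1.0)


h_x, R = slope_and_increment_variance(eta=[1.0], taus=[0.0, math.inf], delta=1.0)
gamma_opt = -1.0 / h_x
var_opt = asymptotic_variance(gamma_opt, h_x, R)
```

With these assumptions `var_opt` coincides with $\pi/2$, the one-bit Gaussian value quoted in the introduction.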

B. Wiener process

If $X_k$ is a Wiener process, the mean of $W_k$ is $u_k = 0$ and the variance is a known constant $\mathrm{Var}[W_k] = \sigma_w^2$. The algorithm gain can be chosen to be a constant $\gamma_k = \gamma$. For small $\sigma_w$, the mean trajectory of $\hat{X}_k$ is also approximated by (10), $x$ being the initial condition $x_0$ of the Wiener process, which is equal to its mean for every $k$. Thus, if $\hat{x}$ converges to $x$, the algorithm is asymptotically unbiased and, in this case, it can be shown that the asymptotic estimation MSE can be approximated in the following way [7, pp. 130-131]:



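$$\mathrm{MSE}_\infty = \lim_{k \to \infty} E\left[\left(\hat{X}_k - X_k\right)^2\right] \approx E\left[\xi_t^2\right]. \qquad (22)$$

The stochastic process $\xi_t$ is the solution of a stochastic differential equation:

$$d\xi_t = \gamma h_{\hat{x}} \xi_t \, dt + \sqrt{\gamma^2 R + \sigma_w^2} \, dZ_t, \qquad (23)$$

where $Z_t$ is a continuous time Wiener process with unit increment variance. Under the condition

$$\gamma h_{\hat{x}} < 0, \qquad (24)$$

$\xi_t$ is stationary with a marginal Gaussian density $\mathcal{N}\left(0, \sigma_\xi^2\right)$, where the variance is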



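$$\sigma_\xi^2 = \frac{\gamma^2 R + \sigma_w^2}{-2\gamma h_{\hat{x}}}. \qquad (25)$$

Thus, $\mathrm{MSE}_\infty$ can be approximated by $\sigma_\xi^2$. Minimizing $\mathrm{MSE}_\infty$ w.r.t. $\gamma$ gives the optimal $\gamma$:

$$\gamma^\star = \frac{\sigma_w}{\sqrt{R}}, \qquad (26)$$

which is a positive real, thus changing the condition (24) into

$$h_{\hat{x}} < 0. \qquad (27)$$

The MSE for $\gamma^\star$ is

$$\mathrm{MSE}_\infty \approx \frac{\sigma_w \sqrt{R}}{-h_{\hat{x}}}. \qquad (28)$$

Using (28) and (21) the MSE can be rewritten as

$$\mathrm{MSE}_\infty \approx \sigma_w \sigma_\infty. \qquad (29)$$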



Both the asymptotic MSE for estimating a Wiener process and the asymptotic variance for estimating a constant depend on the quantizer parameters through $\sigma_\infty^2$, therefore the optimal quantizer parameters will be the same in both cases. The only difference in the adaptive algorithms for these two cases is the sequence of gains $\gamma_k$.
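
The choice of the constant gain for the Wiener model can be checked numerically. The following sketch is an illustration, not the authors' code: the values of $h_{\hat{x}}$, $R$ and $\sigma_w$ are arbitrary. It evaluates the stationary variance $(\gamma^2 R + \sigma_w^2)/(-2\gamma h_{\hat{x}})$ on a grid of gains around $\gamma^\star = \sigma_w/\sqrt{R}$.

```python
import math


def wiener_mse(gamma, h_x, R, sigma_w):
    """Stationary error variance for a constant gain gamma (requires h_x < 0)."""
    return (gamma ** 2 * R + sigma_w ** 2) / (-2.0 * gamma * h_x)


def optimal_wiener_gain(R, sigma_w):
    """Gain minimizing wiener_mse with respect to gamma."""
    return sigma_w / math.sqrt(R)


# Illustrative values only (not taken from the paper).
h_x, R, sigma_w = -0.8, 1.0, 0.01
gamma_opt = optimal_wiener_gain(R, sigma_w)
mse_opt = wiener_mse(gamma_opt, h_x, R, sigma_w)

# Gains between 0.5 and 2 times the optimum, to compare against mse_opt.
grid = [gamma_opt * (0.5 + 0.05 * j) for j in range(31)]
mse_on_grid = [wiener_mse(g, h_x, R, sigma_w) for g in grid]
```

The minimum over the grid is attained at `gamma_opt`, where the MSE equals $\sigma_w \sqrt{R} / (-h_{\hat{x}})$.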

C. Wiener process with drift

In this case the mean of $W_k$ is nonzero and given by a small amplitude sequence $u_k$, and the variance is a constant $\sigma_w^2$. The gain $\gamma_k$ will be considered to be variable in time and, under the assumption of asymptotic unbiasedness for constant $X_k$, the MSE can be approximated by the term due to the estimation bias, which is given by [7, pp. 136]:



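$$\mathrm{MSE}_k = E\left[\left(\hat{X}_k - X_k\right)^2\right] \approx \left(\frac{u_k}{\gamma_k h_{\hat{x}}}\right)^2 + \frac{\gamma_k R}{-2 h_{\hat{x}}}. \qquad (30)$$

Minimization w.r.t. $\gamma_k$ leads to

$$\gamma_k^\star = \left(\frac{4 u_k^2}{-h_{\hat{x}} R}\right)^{1/3}, \qquad (31)$$

$$\mathrm{MSE}_k \approx 3\left(\frac{u_k R}{4 h_{\hat{x}}^2}\right)^{2/3}. \qquad (32)$$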



Note that in practice, $u_k$ may be unknown and it will be necessary to replace its value in $\gamma_k$ by an estimate $\hat{u}_k$, which can also be obtained adaptively, for example by calculating a recursive mean of $\hat{X}_k - \hat{X}_{k-1}$.
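
A small numerical sketch of the drift case follows. It assumes that the MSE has the form of a squared bias term $(u/(\gamma h_{\hat{x}}))^2$ plus a variance term $\gamma R/(-2 h_{\hat{x}})$, and that the drift is estimated with a fixed-gain recursive mean. Both the smoothing gain and the function names are hypothetical choices made for illustration.

```python
def drift_mse(gamma, u, h_x, R):
    """Assumed MSE model: squared bias plus variance term (h_x < 0)."""
    return (u / (gamma * h_x)) ** 2 + gamma * R / (-2.0 * h_x)


def optimal_drift_gain(u, h_x, R):
    """Gain obtained by setting the derivative of drift_mse to zero."""
    return (4.0 * u * u / (-h_x * R)) ** (1.0 / 3.0)


def estimate_drift(x_hats, smoothing=0.05):
    """Recursive mean of successive differences of the estimates."""
    u_hat = 0.0
    for prev, cur in zip(x_hats, x_hats[1:]):
        u_hat += smoothing * ((cur - prev) - u_hat)
    return u_hat


# Illustrative values only (not taken from the paper).
u, h_x, R = 1e-3, -0.8, 1.0
gamma_opt = optimal_drift_gain(u, h_x, R)
mse_opt = drift_mse(gamma_opt, u, h_x, R)
mse_closed_form = 3.0 * (u * R / (4.0 * h_x ** 2)) ** (2.0 / 3.0)

# A noiseless ramp with slope 0.1: the recursive mean should approach 0.1.
u_hat = estimate_drift([0.1 * k for k in range(501)])
```

In a real run `estimate_drift` would be applied to the noisy estimates, so the smoothing gain trades noise rejection against tracking speed.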

The MSE can also be rewritten as a function of $\sigma_\infty^2$ with a dependence on $u_k$:

$$\mathrm{MSE}_k \approx 3\left(\frac{u_k \sigma_\infty^2}{4}\right)^{2/3}. \qquad (33)$$



Also in this case the MSE is an increasing function of $\sigma_\infty^2$. From the three cases it is possible to see that the quantizer design will depend on the following:

1) Asymptotic unbiasedness: it is necessary to prove asymptotic unbiasedness of the algorithm when $X_k$ is constant for the MSE results given above to be valid. This can be done by proving the asymptotic global stability of the ODE (10) for an arbitrary $X_k = x$ and $\hat{X}_0 = \hat{x}(0)$ in $\mathbb{R}$.

2) Minimization of $\sigma_\infty^2$: the quantizer parameters can be chosen to minimize $\sigma_\infty^2$ and, as a consequence, they will maximize the performance for the three evolution models of $X_k$.

IV. Asymptotic unbiasedness and adaptive algorithm design

In this section, first it will be shown that the algorithm is asymptotically unbiased. Then, the asymptotic performance of the algorithm will be optimized by minimizing $\sigma_\infty^2$, which depends on $\eta_i$, $\Delta$ and $\boldsymbol{\tau}$. The optimal coefficients $\eta_i$ will be found and then the choice of the parameters $\Delta$ and $\boldsymbol{\tau}$ will be discussed.

A. Asymptotic unbiasedness 

For the asymptotic performance results to be valid, it is necessary to prove that the estimation procedure when $X_k = x$ is asymptotically unbiased. To do so, one needs to prove that the solution of (10) for any $\hat{x}(0)$ and $x$ tends to $x$ as $t \to \infty$.

The approximation for the mean error can be written as

$$\varepsilon = \hat{x} - x \qquad (34)$$

and the ODE for the mean error is

$$\frac{d\varepsilon}{dt} = \gamma \tilde{h}(\varepsilon), \qquad (35)$$



where $\tilde{h}(\varepsilon) = h(\varepsilon + x)$ is a function that does not depend on $x$.

It is necessary to prove that $\varepsilon \to 0$ as $t \to \infty$ for every $\varepsilon(0) \in \mathbb{R}$, which means that $\varepsilon = 0$ is a globally asymptotically stable point [8]. Global asymptotic stability of $\varepsilon = 0$ can be shown using an asymptotic stability theorem for nonlinear ODEs. This requires the definition of an unbounded Lyapunov function of the error. To simplify, a quadratic function will be used:

$$L(\varepsilon) = \varepsilon^2, \qquad (36)$$

which is a positive definite function and tends to infinity when $|\varepsilon|$ tends to infinity.

If $\gamma \tilde{h}(\varepsilon) = 0$ for $\varepsilon = 0$ and $\frac{dL}{dt} < 0$ for $\varepsilon \neq 0$, then by the Barbashin-Krasovskii theorem [8, Ch. 4], $\varepsilon = 0$ is a globally asymptotically stable point.



To show that both conditions are met, expression (12) can be rewritten using A6:

$$\tilde{h}(\varepsilon) = \sum_{i=1}^{N_I/2} \eta_i \left[\tilde{F}_d(i, \varepsilon) - \tilde{F}_d(-i, \varepsilon)\right], \qquad (37)$$

where $\tilde{F}_d(i, \varepsilon) = F_d(i, \varepsilon + x, x)$ is also a function that does not depend on $x$.

When $\varepsilon = 0$, the differences between the $\tilde{F}_d$ in the sum are differences between probabilities on symmetric intervals; the symmetry of the noise PDF stated in A3 and the symmetry of the quantizer stated in A5 imply that $\tilde{h}(0) = 0$, fulfilling the first condition.

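The second condition can be written in more detail by using the chain rule for the derivative:

$$\frac{dL}{dt} = \frac{dL}{d\varepsilon} \frac{d\varepsilon}{dt} = 2 \varepsilon \gamma \tilde{h}(\varepsilon) < 0, \quad \text{for } \varepsilon \neq 0. \qquad (38)$$

As $\gamma > 0$ by definition, $\tilde{h}(\varepsilon)$ has to respect the following constraints:

$$\tilde{h}(\varepsilon) > 0 \ \text{for } \varepsilon < 0, \qquad \tilde{h}(\varepsilon) < 0 \ \text{for } \varepsilon > 0. \qquad (39)$$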



When $\varepsilon \neq 0$, the terms in the sum that gives $\tilde{h}(\varepsilon)$ are differences between integrals of the noise PDF over intervals of the same size but with asymmetric interval centers. Using the symmetry assumptions, for $\varepsilon > 0$, $\tilde{F}_d(i, \varepsilon)$ is the integral of $f$ over an interval more distant from zero than the one for $\tilde{F}_d(-i, \varepsilon)$. Then, by the decreasing assumption on $f$, $\tilde{F}_d(i, \varepsilon) < \tilde{F}_d(-i, \varepsilon)$ and consequently $\tilde{h}(\varepsilon) < 0$. Using the same reasoning for $\varepsilon < 0$, one can show that $\tilde{h}(\varepsilon) > 0$. Therefore, the inequalities in (39) are verified and $\frac{dL}{dt} < 0$ for $\varepsilon \neq 0$.

Finally, as both conditions are satisfied, one can say that $\varepsilon = 0$ is globally asymptotically stable, which means that the estimator is asymptotically unbiased and that all the performance results obtained are valid.

Note that from A3 and A5, $h_{\hat{x}}(x) < 0$, thus the supplementary condition for stationarity (24) is also respected.
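
The sign conditions on the mean field can be checked numerically for a specific case. The sketch below is illustrative only: it assumes unit-variance Gaussian noise, a four-interval quantizer with normalized thresholds $[0, 1, \infty)$ and arbitrary positive levels. It evaluates $\tilde{h}(\varepsilon)$ from the interval probabilities.

```python
import math


def gauss_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def h_tilde(eps, eta, taus, delta):
    """Mean correction as a function of the mean error eps = x_hat - x.

    eta  = [eta_1, ..., eta_{N/2}] (positive, odd symmetry assumed)
    taus = [tau_0 = 0, tau_1, ..., tau_{N/2} = inf]
    """
    total = 0.0
    for i in range(1, len(taus)):
        lower, upper = taus[i - 1] * delta, taus[i] * delta
        # Probability that the quantizer index is +i, then -i, given eps.
        p_pos = gauss_cdf(upper + eps) - gauss_cdf(lower + eps)
        p_neg = gauss_cdf(-lower + eps) - gauss_cdf(-upper + eps)
        total += eta[i - 1] * (p_pos - p_neg)
    return total


eta = [0.5, 1.5]            # illustrative output levels
taus = [0.0, 1.0, math.inf]
errors = [0.1, 0.5, 2.0, 10.0]
values_pos = [h_tilde(e, eta, taus, 1.0) for e in errors]
values_neg = [h_tilde(-e, eta, taus, 1.0) for e in errors]
```

For these settings the mean correction vanishes at zero error and always points back towards it, which is the behaviour required by the Lyapunov argument.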

B. Optimal quantizer parameters 

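The performance of the adaptive algorithm can be maximized by minimizing $\sigma_\infty^2$ w.r.t. the quantizer levels $\eta_i$. Using (16) and (19) in (21) gives the following minimization problem:

$$\arg\min_{\boldsymbol{\eta}} \frac{R}{h_{\hat{x}}^2} = \arg\min_{\boldsymbol{\eta}} \frac{\boldsymbol{\eta}^T \mathbf{F}_d \boldsymbol{\eta}}{2\left[\boldsymbol{\eta}^T \mathbf{f}_d\right]^2}, \qquad (40)$$

where $\boldsymbol{\eta}$ is a vector with the coefficients

$$\boldsymbol{\eta} = \left[\eta_1 \ \cdots \ \eta_{\frac{N_I}{2}}\right]^T, \qquad (41)$$

$\mathbf{F}_d$ is a diagonal matrix given by

$$\mathbf{F}_d = \mathrm{diag}\left(F_d[1], \dots, F_d\left[\tfrac{N_I}{2}\right]\right) \qquad (42)$$

and $\mathbf{f}_d$ is the following vector

$$\mathbf{f}_d = \left[f_d[1] \ \cdots \ f_d\left[\tfrac{N_I}{2}\right]\right]^T. \qquad (43)$$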



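The minimization problem is equivalent to the following maximization problem:

$$\arg\max_{\boldsymbol{\eta}} \frac{\left[\boldsymbol{\eta}^T \mathbf{f}_d\right]^2}{\boldsymbol{\eta}^T \mathbf{F}_d \boldsymbol{\eta}}. \qquad (44)$$

Using the fact that $\mathbf{F}_d$ is diagonal with nonzero diagonal elements, (44) becomes

$$\arg\max_{\boldsymbol{\eta}} \frac{\left[\left(\mathbf{F}_d^{1/2} \boldsymbol{\eta}\right)^T \left(\mathbf{F}_d^{-1/2} \mathbf{f}_d\right)\right]^2}{\left(\mathbf{F}_d^{1/2} \boldsymbol{\eta}\right)^T \left(\mathbf{F}_d^{1/2} \boldsymbol{\eta}\right)}, \qquad (45)$$

where the matrices $\mathbf{F}_d^{1/2}$ and $\mathbf{F}_d^{-1/2}$ are obtained by taking the square root and the inverse of the square root of the diagonal elements of $\mathbf{F}_d$. Using the Cauchy-Schwarz inequality on the expression in the numerator gives

$$\frac{\left[\left(\mathbf{F}_d^{1/2} \boldsymbol{\eta}\right)^T \left(\mathbf{F}_d^{-1/2} \mathbf{f}_d\right)\right]^2}{\left(\mathbf{F}_d^{1/2} \boldsymbol{\eta}\right)^T \left(\mathbf{F}_d^{1/2} \boldsymbol{\eta}\right)} \leq \mathbf{f}_d^T \mathbf{F}_d^{-1} \mathbf{f}_d \qquad (46)$$

and the equality happens for

$$\mathbf{F}_d^{1/2} \boldsymbol{\eta} \propto \mathbf{F}_d^{-1/2} \mathbf{f}_d. \qquad (47)$$

Therefore, the optimal $\boldsymbol{\eta}$ can be chosen to be

$$\boldsymbol{\eta}^\star = \mathbf{F}_d^{-1} \mathbf{f}_d. \qquad (48)$$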



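It is possible to see that the coefficients chosen in this way still depend on $\Delta$ and $\boldsymbol{\tau}$. The minimum is

$$\sigma_\infty^2 = \frac{1}{2\, \mathbf{f}_d^T \mathbf{F}_d^{-1} \mathbf{f}_d} = \left[2 \sum_{i=1}^{N_I/2} \frac{f_d^2[i]}{F_d[i]}\right]^{-1}. \qquad (49)$$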



To simplify the choice of the constant $\Delta$, it will be considered that the noise CDF is parametrized by a known scale parameter $\delta$, which means that

$$F(x) = F_n\left(\frac{x}{\delta}\right), \qquad (50)$$

where $F_n$ is the noise CDF for $\delta = 1$. Thus, the evaluation of the quantizer output levels can be simplified by setting:

$$\Delta = c_\Delta \delta. \qquad (51)$$



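Since the coefficients $\eta_i^\star$ do not depend on $x$ anymore, for a given $c_\Delta$ and noise CDF, they can be pre-calculated and stored in a table. For $i > 0$, these coefficients are given by

$$\eta_i^\star = \frac{f_d[i]}{F_d[i]} = \frac{1}{\delta} \, \frac{f_n\left(\tau_{i-1} c_\Delta\right) - f_n\left(\tau_i c_\Delta\right)}{F_n\left(\tau_i c_\Delta\right) - F_n\left(\tau_{i-1} c_\Delta\right)}, \qquad (52)$$

where $f_n$ is the noise PDF for $\delta = 1$.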



Note that for $\Delta$ given by (51), $\eta_i$ depends on $\delta$ only through a $\frac{1}{\delta}$ multiplicative factor; the other factor can be written as a function of normalized PDFs and CDFs, thus this factor can be pre-calculated based only on the normalized distribution. Note also that the $\eta_i^\star$ are given by the score function for estimating a constant location parameter when considering that the offset is fixed and placed exactly at $x$; therefore this algorithm is equivalent to a gradient ascent technique to maximize the log-likelihood that iterates only once per observation and sets the offset each time at the last estimate.
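
The Cauchy-Schwarz argument can be illustrated numerically. The sketch below is only a check under stated assumptions: unit-variance Gaussian noise, six quantization intervals, and an arbitrary $\Delta = 0.8$. It computes the levels $\eta_i = f_d[i]/F_d[i]$ and compares the ratio $R/h_{\hat{x}}^2$ they achieve with the ratio obtained for perturbed levels. Function names are hypothetical.

```python
import math


def gauss_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def gauss_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def interval_terms(taus, delta):
    """F_d[i] and f_d[i] for i = 1..N/2 when the offset sits at the parameter."""
    F_d = [gauss_cdf(taus[i] * delta) - gauss_cdf(taus[i - 1] * delta)
           for i in range(1, len(taus))]
    f_d = [gauss_pdf(taus[i - 1] * delta) - gauss_pdf(taus[i] * delta)
           for i in range(1, len(taus))]
    return F_d, f_d


def variance_ratio(eta, F_d, f_d):
    """R / h_x^2 for the output levels eta."""
    R = 2.0 * sum(e * e * F for e, F in zip(eta, F_d))
    h_x = -2.0 * sum(e * f for e, f in zip(eta, f_d))
    return R / h_x ** 2


taus = [0.0, 1.0, 2.0, math.inf]
F_d, f_d = interval_terms(taus, delta=0.8)
eta_opt = [f / F for f, F in zip(f_d, F_d)]
best = variance_ratio(eta_opt, F_d, f_d)
bound = 1.0 / (2.0 * sum(f * f / F for f, F in zip(f_d, F_d)))

# Perturbed levels never do better than the optimal ones.
perturbed = [[eta_opt[0] * a, eta_opt[1] * b, eta_opt[2]]
             for a in (0.5, 1.0, 1.5) for b in (0.5, 1.0, 2.0)]
ratios = [variance_ratio(p, F_d, f_d) for p in perturbed]
```

The ratio is invariant to a common scaling of all levels, which is why the optimal vector is only defined up to proportionality before the gain is fixed.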
Using the $\eta_i$ from (52), the adaptive estimator can be written as

$$\hat{X}_k = \hat{X}_{k-1} + \gamma_k \, \mathrm{sign}(i_k) \, \eta_{|i_k|}, \qquad (53)$$

with $i_k = Q\left(\frac{Y_k - \hat{X}_{k-1}}{c_\Delta \delta}\right)$.

The sum in (49) is the Fisher information $I_q$ for estimating a constant $x$ from the output of the adjustable quantizer with an offset exactly placed at $x$:

$$I_q = 2 \sum_{i=1}^{N_I/2} \frac{f_d^2[i]}{F_d[i]}. \qquad (54)$$

This quantity can be maximized w.r.t. $\boldsymbol{\tau}$, thus leading to the following optimization problem:

$$\arg\max_{\boldsymbol{\tau}} I_q. \qquad (55)$$

Problem (55) without constraints on the thresholds seems to be very difficult to solve analytically and no simple solutions for this problem were found in the literature. Therefore, general solutions for (55) will not be treated here. For the results that will be presented in section V it will be considered that the quantizer is uniform, with $\boldsymbol{\tau}$ defined as follows:

$$\tau_i = i \ \ \text{for } i \in \left\{1, \dots, \frac{N_I}{2} - 1\right\}, \qquad \tau_{\frac{N_I}{2}} = +\infty. \qquad (56)$$

In this case, only $c_\Delta$ needs to be set and consequently a grid method can be used.

In the next section the results for each case using the choice of parameters obtained above will be detailed and discussed.

V. Results and simulation

It will be supposed that the noise CDF and $\delta$ are known and also the type of evolution model for $X_k$. Thus for a given $N_I$, $c_\Delta$ and $\boldsymbol{\tau}$, the coefficients $\eta_i$ used in the estimation algorithm (53) can be calculated using (52).

There are two quantities that still need to be determined, $h_{\hat{x}}$ and $R$. Using (52) in (16) and (19) gives

$$h_{\hat{x}} = -2 \sum_{i=1}^{N_I/2} \frac{f_d^2[i]}{F_d[i]} = -I_q, \qquad (57)$$

$$R = 2 \sum_{i=1}^{N_I/2} \frac{f_d^2[i]}{F_d[i]} = I_q. \qquad (58)$$

The specific gain $\gamma_k$ and the performance of the algorithm for each model will now be determined.

A. Constant $X_k$

Replacing $h_{\hat{x}}$ given by (57) in (20) and the result in (9) gives the following gains:

$$\gamma_k = \frac{1}{k I_q}, \qquad (59)$$

and by replacing (57) and (58) in (21), $\sigma_\infty^2$ is obtained:

$$\sigma_\infty^2 = \frac{1}{I_q}. \qquad (60)$$

In practice this means that for large $k$, the estimation variance will be (cf. (14))

$$\mathrm{Var}\left[\hat{X}_k\right] \approx \frac{1}{k I_q}. \qquad (61)$$

The right hand side of (61) is the inverse of the Fisher information for estimating $X_k = x$ based on $i_k$ when the offset is fixed to be $x$. The inverse of the Fisher information is known as the Cramér-Rao bound and it is a lower bound on the variance of unbiased estimators [9, Ch. 3]. This means that for large $k$, the estimator has the lowest possible variance within the class of unbiased estimators using quantized observations with offset $b_k = x$.

In the continuous case (infinite number of quantization intervals) the CRB for $k$ observations is given by

$$\mathrm{CRB}_c = \frac{1}{k I_c}, \qquad (62)$$

where $I_c$ is the Fisher information given by

$$I_c = \int_{\mathbb{R}} \left(\frac{f'(x)}{f(x)}\right)^2 f(x) \, dx \qquad (63)$$

and $f'(x) = \frac{df(x)}{dx}$. In the cases where $I_c$ exists and for large $k$, one can calculate the loss of estimation performance $L_q$ in decibels (dB) in the following way:

$$L_q = -10 \log_{10}\left(\frac{\mathrm{CRB}_c}{\mathrm{Var}\left[\hat{X}_k\right]}\right) \approx -10 \log_{10}\left(\frac{I_q}{I_c}\right). \qquad (64)$$

B. Wiener process

Using (58) in (26), the following constant gain is obtained:

$$\gamma^\star = \frac{\sigma_w}{\sqrt{I_q}}, \qquad (65)$$

and for this gain, the asymptotic MSE is obtained by substituting (60) in (29):

$$\mathrm{MSE}_\infty \approx \frac{\sigma_w}{\sqrt{I_q}}. \qquad (66)$$



The comparison with the continuous case can also be done using a lower bound on the variance. In this case, as $X_k$ is random, the Bayesian Cramér-Rao bound (BCRB) can be used. This bound is defined as the inverse of the Bayesian information $J_k$ for time $k$ [10, Ch. 1]:

$$\mathrm{BCRB}_k = \frac{1}{J_k}. \qquad (67)$$

For a Wiener process, the Bayesian information can be calculated recursively. The recursive expression, given in its general form in [11], for a scalar Wiener process observed with additive noise is

$$J_k = I_c + \frac{1}{\sigma_w^2} - \frac{1}{\sigma_w^4} \left(J_{k-1} + \frac{1}{\sigma_w^2}\right)^{-1}. \qquad (68)$$

The comparison must be done for $k \to \infty$. After calculating the fixed point $J_\infty$ of (68), the asymptotic BCRB obtained is

$$\mathrm{BCRB}_\infty = \frac{1}{J_\infty} = \frac{2}{I_c + \sqrt{I_c^2 + \frac{4 I_c}{\sigma_w^2}}}. \qquad (69)$$

Expression (66) is only valid for small $\sigma_w$; in this case (69) can be approximated by

$$\mathrm{BCRB}_\infty \approx \frac{\sigma_w}{\sqrt{I_c}} \qquad (70)$$

and the loss in asymptotic performance for the estimation of the Wiener process can be approximated by a function of $L_q$:

$$L_q^{W} = -10 \log_{10}\left(\frac{\mathrm{BCRB}_\infty}{\mathrm{MSE}_\infty}\right) \approx -10 \log_{10}\left(\sqrt{\frac{I_q}{I_c}}\right) = \frac{1}{2} L_q. \qquad (71)$$

C. Wiener process with drift

The varying optimal gain and the MSE are obtained by replacing (57) and (58) in (31) and (32):

$$\gamma_k^\star = \left(\frac{4 u_k^2}{I_q^2}\right)^{1/3}, \qquad (72)$$

$$\mathrm{MSE}_k \approx 3\left(\frac{u_k}{4 I_q}\right)^{2/3}. \qquad (73)$$

As $u_k$ is unknown, it might be estimated. For slowly varying $u_k$ it can be estimated by smoothing the differences between successive estimates:

$$\hat{u}_k = \hat{u}_{k-1} + \gamma^u \left[\left(\hat{X}_k - \hat{X}_{k-1}\right) - \hat{u}_{k-1}\right], \qquad (74)$$

where $\gamma^u$ is a small positive smoothing gain. Then, $\hat{u}_k$ can replace $u_k$ in the evaluation of the gain and the MSE. If more information about the evolution of $u_k$ is known, it might be incorporated in (74) to have more precise estimates and get closer to the optimal adaptive gain.

As it is hard to have a bound on performance for the estimation of a deterministic signal under non-Gaussian noise, the comparison with the continuous observation case will be done using the approximate performance for a nonlinear adaptive algorithm using continuous observations. The algorithm has the following form:

$$\hat{X}_k = \hat{X}_{k-1} + \gamma_k^c \, \eta_c\left(Y_k - \hat{X}_{k-1}\right), \qquad (75)$$

where $\gamma_k^c$ and the nonlinearity $\eta_c(x)$ are optimized to minimize the MSE.

Using the same theory described for the quantized case it is possible to show that the optimal $\gamma_k^c$ and $\eta_c(x)$ are

$$\gamma_k^{c\star} = \left(\frac{4 u_k^2}{I_c^2}\right)^{1/3}, \qquad (76)$$

$$\eta_c(x) = -\frac{f'(x)}{f(x)}, \qquad (77)$$

which exist under the constraint that $I_c$ converges and is not zero and that $f'(x)$ exists for every $x$.

The MSE can be approximated in a similar way as before:

$$\mathrm{MSE}_k^c \approx 3\left(\frac{u_k}{4 I_c}\right)^{2/3}. \qquad (78)$$

Therefore, the loss in performance incurred by quantizing the observations in the estimation of the Wiener process with drift, $L_q^{WD}$, can be approximated by

$$L_q^{WD} = -10 \log_{10}\left(\frac{\mathrm{MSE}_k^c}{\mathrm{MSE}_k}\right) \approx -10 \log_{10}\left(\left(\frac{I_q}{I_c}\right)^{2/3}\right) = \frac{2}{3} L_q. \qquad (79)$$

The losses for the three models of $X_k$ depend directly on $L_q$, thus $L_q$ makes it possible to approximate how much performance is lost for a specific type of noise and threshold set compared to the optimal (possibly suboptimal in the case with drift) estimator based on continuous measurements. In the next subsection the loss will be evaluated for two different classes of noise considering that the quantization is uniform, then the adaptive algorithm will be simulated in the three cases and the simulated loss will be compared to the results given above to check their validity.


D. Simulation 

The thresholds are considered to be uniform and given by
(56). For a given type of noise, supposing that $\delta$ is known
and for fixed $N_I$, $I_q$ can be evaluated by replacing (56) and
(51) in the expressions for $f_d$ and $F_d$. As $I_q$ is now a function
of $c_\Delta$ only, it can be maximized by adjusting this parameter.
Being a scalar maximization problem, this can be done by
using grid optimization (searching for the maximum in a fine
grid of possible $c_\Delta$). After finding the optimal $c_\Delta$ and $I_q$,
the coefficients $\eta_i$, the optimal gains $\gamma_k$ and the quantizer
input gain can be evaluated and then all the parameters
are defined.
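The grid optimization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the noise is taken to be Gaussian-shaped (GG with $\beta = 2$ and $\delta = 1$), the number of intervals is even with thresholds at integer multiples of $c_\Delta$ (a stand-in for (56), which is not reproduced here), $I_q$ is computed with the standard expression $\sum_i (f(\tau_{i-1}) - f(\tau_i))^2 / (F(\tau_i) - F(\tau_{i-1}))$ for a quantizer centered on the parameter, and the loss is assumed to be $-10 \log_{10}(I_q / I_c)$. All function names and the grid are hypothetical.

```python
import math

def gauss_pdf(x):
    # Assumed noise density: GG with beta = 2, delta = 1, i.e. exp(-x^2)/sqrt(pi).
    return math.exp(-x * x) / math.sqrt(math.pi)

def gauss_cdf(x):
    return 0.5 * (1.0 + math.erf(x))

def uniform_thresholds(n_intervals, c_delta):
    # Assumed layout for an even n_intervals: 0, +/-c_delta, +/-2*c_delta, ...
    # closed by -inf and +inf.
    half = n_intervals // 2
    inner = [i * c_delta for i in range(-(half - 1), half)]
    return [-math.inf] + inner + [math.inf]

def fisher_quantized(pdf, cdf, thresholds):
    # I_q = sum over intervals of (f(lo) - f(hi))^2 / (F(hi) - F(lo)).
    total = 0.0
    for lo, hi in zip(thresholds[:-1], thresholds[1:]):
        f_lo = 0.0 if math.isinf(lo) else pdf(lo)
        f_hi = 0.0 if math.isinf(hi) else pdf(hi)
        p_lo = 0.0 if math.isinf(lo) else cdf(lo)
        p_hi = 1.0 if math.isinf(hi) else cdf(hi)
        prob = p_hi - p_lo
        if prob > 0.0:  # skip intervals whose probability underflows to zero
            total += (f_lo - f_hi) ** 2 / prob
    return total

def best_step(pdf, cdf, n_intervals, grid):
    # Grid optimization: keep the step that gives the largest I_q.
    scored = [(fisher_quantized(pdf, cdf, uniform_thresholds(n_intervals, c)), c)
              for c in grid]
    i_q, c_best = max(scored)
    return c_best, i_q

if __name__ == "__main__":
    i_c = 2.0  # Fisher information of exp(-x^2)/sqrt(pi) for a location parameter
    grid = [0.01 * k for k in range(1, 301)]
    for n_bits in (1, 2, 3, 4):
        c_best, i_q = best_step(gauss_pdf, gauss_cdf, 2 ** n_bits, grid)
        loss_db = -10.0 * math.log10(i_q / i_c)  # assumed definition of the loss
        print(n_bits, round(c_best, 2), round(loss_db, 3))
```

With a single threshold at zero the step has no effect and the sketch gives $I_q = 4/\pi$, a loss of about 1.96 dB, which falls inside the 1 dB to 4 dB range reported for 1-bit quantization.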

Note that the model for $X_k$ is supposed to be known, as
setting $\gamma_k$ depends on it. As a consequence of this assumption,
in a real application the choice between the three models must
be clear. When this choice is not clear from the application, it
is always simpler to choose $X_k$ to be a Wiener process: first,
because the complexity of the algorithm is lower and second,
because supposing that the increments are Gaussian and i.i.d.
does not impose too much information on the evolution of
$X_k$. Still, $\sigma_w$ must be known; in practice it can be set based
on prior knowledge of the possible variation of $X_k$ or, by
accepting a slower convergence and a small loss of asymptotic
performance, it can be estimated jointly with $X_k$ using an extra
adaptive estimator for it. In the last case, when it is known that
the increments of $X_k$ have a deterministic component, the fact
that $\gamma_k$ depends on $u_k$ is not very useful, and prior information
on the variations of $X_k$ is not normally as detailed as
knowing $u_k$ itself, making it necessary to accept a small loss of
performance to estimate $u_k$ jointly. The estimation of $u_k$ can
be done using (74), where prior knowledge on the variations
of $u_k$ can be integrated in the gain $\gamma_k^u$. If precise knowledge
on the evolution of $u_k$ is available through dynamical models,
then it might be more useful to use other forms of adaptive
estimators known as multi-step algorithms [7, Ch. 4].

The evaluation of the loss and the verification of the results 
will be done considering two different classes of noise that 
verify assumptions A1 to A3, namely, generalized Gaussian
(GG) noise and Student's-t (ST) noise. The motivation for 
the use of these two densities comes from signal processing, 
statistics and information theory. 

In signal processing, when additive noise is not constrained
to be Gaussian, a common assumption is that the noise follows
a GG distribution [12]. This distribution not only contains the
Gaussian case as a specific example but, by changing
one of its parameters, can also represent anything from the
impulsive Laplacian case to distributions close to the uniform case.
In robust statistics, when the additive noise is considered
to be impulsive, a general class for the distribution of the
noise is the ST distribution [13]. The ST distribution includes as
a specific case the Cauchy distribution, known to be heavy
tailed and thus normally used in robust statistics; also, by
changing a parameter of the distribution, an entire class of
heavy-tailed distributions can be represented. From an
information point of view, if no priors on the noise
distributions are given, noise models must be as random as
possible to ensure that the noise is an uninformative part of the
observation; thus noise models must maximize some criterion
of randomness. Commonly used criteria for randomness are
entropy measures, and both distributions considered above are
entropy maximizers. GG distributions maximize the Shannon
entropy under constraints on the moments [14, Ch. 12] and ST
distributions maximize the Rényi entropy under constraints on
the second-order moment [15].

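Both distributions are parametrized by a shape parameter
$\beta \in \mathbb{R}^+$ and their PDFs and CDFs for $\delta = 1$ are

$$f_{GG}(x) = \frac{\beta}{2\,\Gamma\left(\frac{1}{\beta}\right)} \exp\left(-|x|^{\beta}\right), \qquad (80)$$

$$F_{GG}(x) = \frac{1}{2}\left[1 + \operatorname{sign}(x)\,\frac{\gamma\left(\frac{1}{\beta}, |x|^{\beta}\right)}{\Gamma\left(\frac{1}{\beta}\right)}\right], \qquad (81)$$

for the GG distribution, where $\gamma(\cdot,\cdot)$ is the incomplete gamma
function and $\Gamma(\cdot)$ is the gamma function, and

$$f_{ST}(x) = \frac{\Gamma\left(\frac{\beta+1}{2}\right)}{\sqrt{\beta\pi}\,\Gamma\left(\frac{\beta}{2}\right)}\left(1 + \frac{x^{2}}{\beta}\right)^{-\frac{\beta+1}{2}}, \qquad (82)$$

$$F_{ST}(x) = \frac{1}{2}\left[1 + \operatorname{sign}(x)\left(1 - I_{\frac{\beta}{x^{2}+\beta}}\left(\frac{\beta}{2}, \frac{1}{2}\right)\right)\right], \qquad (83)$$

for the ST distribution, where $I_{\cdot}(\cdot,\cdot)$ is the incomplete
beta function.

Fig. 3. Loss of performance due to quantization of measurements for different
types of noise and number of quantization bits.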

1) Performance loss - $L_q$: The first quantity to be evaluated
will be the loss $L_q$. To evaluate $L_q$, after evaluating $I_q$ based
on $f$ and $F$ defined above, it is also necessary to evaluate $I_c$.

Evaluating the integral in (63), one obtains for the GG and
ST distributions respectively:

$$I_{c,GG} = \frac{\beta(\beta-1)\,\Gamma\left(1-\frac{1}{\beta}\right)}{\Gamma\left(\frac{1}{\beta}\right)}, \qquad (84)$$

$$I_{c,ST} = \frac{\beta+1}{\beta+3}. \qquad (85)$$
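As a sanity check, the closed forms for $I_c$ can be compared with a direct numerical integration of $(f'(x))^2 / f(x)$. The sketch below is illustrative only: it assumes the unit-scale densities $f_{GG}(x) \propto \exp(-|x|^{\beta})$ and the standard Student's-t density with $\beta$ degrees of freedom, uses a plain trapezoidal rule with a hypothetical integration range, and all function names are made up for the example.

```python
import math

def fisher_gg(beta):
    # beta * (beta - 1) * Gamma(1 - 1/beta) / Gamma(1/beta); requires beta > 1.
    return beta * (beta - 1.0) * math.gamma(1.0 - 1.0 / beta) / math.gamma(1.0 / beta)

def fisher_st(beta):
    return (beta + 1.0) / (beta + 3.0)

def gg_pdf(x, beta):
    return beta / (2.0 * math.gamma(1.0 / beta)) * math.exp(-abs(x) ** beta)

def gg_score(x, beta):
    # -f'(x)/f(x) for the assumed GG density.
    return beta * math.copysign(abs(x) ** (beta - 1.0), x)

def st_pdf(x, beta):
    norm = math.gamma((beta + 1.0) / 2.0) / (
        math.sqrt(beta * math.pi) * math.gamma(beta / 2.0))
    return norm * (1.0 + x * x / beta) ** (-(beta + 1.0) / 2.0)

def st_score(x, beta):
    # -f'(x)/f(x) for the assumed Student's-t density.
    return (beta + 1.0) * x / (beta + x * x)

def fisher_numeric(pdf, score, beta, limit=60.0, steps=120_000):
    # Trapezoidal rule for the integral of score(x)^2 * pdf(x) on [-limit, limit].
    h = 2.0 * limit / steps
    total = 0.0
    for i in range(steps + 1):
        x = -limit + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * score(x, beta) ** 2 * pdf(x, beta)
    return total * h

if __name__ == "__main__":
    for beta in (1.5, 2.0, 2.5, 3.0):
        print("GG", beta, fisher_gg(beta), fisher_numeric(gg_pdf, gg_score, beta))
    for beta in (1.0, 2.0, 3.0):
        print("ST", beta, fisher_st(beta), fisher_numeric(st_pdf, st_score, beta))
```

Two easy special cases: the Gaussian-shaped density ($\beta = 2$, variance $1/2$) gives $I_c = 2$, and the Cauchy density (ST with $\beta = 1$) gives $I_c = 1/2$.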



The loss was evaluated for $N_I \in \{2, 4, 8, 16, 32\}$, which
corresponds to $N_B = \log_2(N_I) \in \{1, 2, 3, 4, 5\}$
bits, and for the shape parameters $\beta \in \{1.5, 2, 2.5, 3\}$ for GG
noise and $\beta \in \{1, 2, 3\}$ for ST noise. The results are shown
in Fig. 3. As expected, the loss decreases with increasing
$N_B$. It is interesting to note that the maximum loss, observed
for $N_B = 1$, goes from approximately 1 dB to 4 dB, which
represents an MSE increase by a factor of less than 3 for estimating a
constant with 1-bit quantization. Also interesting is the fact that
the loss decreases rapidly with $N_B$: for 2-bit quantization all
the tested types of noise produce losses below 1 dB, resulting
in linear increases in MSE not larger than 1.3. This indicates
that when using the adaptive estimators developed here, it is
not very useful to use more than 4 or 5 bits for quantization.
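The conversion between the loss in dB and the linear MSE factor quoted above is a one-line computation. The snippet below assumes the usual convention that a loss of $L$ dB multiplies the MSE by $10^{L/10}$; the function name is hypothetical.

```python
def db_to_factor(loss_db):
    # A loss of L dB corresponds to an MSE ratio of 10^(L/10).
    return 10.0 ** (loss_db / 10.0)

if __name__ == "__main__":
    for loss_db in (1.0, 4.0):
        print(loss_db, round(db_to_factor(loss_db), 2))
```

Under this convention, 1 dB corresponds to a factor of about 1.26 and 4 dB to a factor of about 2.51, consistent with the bounds of 1.3 and 3 stated in the text.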

The performance for 2 bits seems to be related to the noise
tail: smaller losses were obtained for distributions
with heavier tails (ST distributions and the GG distribution with
$\beta = 1.5$). This is due to the fact that for heavy-tailed distributions
a small region around the median of the distribution is very
informative; as most of the information is contained there,
when the only threshold available is placed there, the relative
gain of information is greater than in the other cases, leading
to smaller losses. This can also be the reason for the slow
decrease of the loss for these distributions: as the quantizer
thresholds are placed uniformly, some of them will be placed
in the non-informative amplitude region and consequently the
decrease in loss will not be as sharp as in the other cases.

Fig. 4. Constant. Quantization loss of performance for GG and ST noises and $N_B \in \{2, 3, 4, 5\}$ when $X_k$ is constant. For each type of noise there are 4
curves: the constant losses are the theoretical results and the decreasing losses are the simulated results, thus producing pairs of curves of the same type; for
each pair, the higher results represent the lower number of quantization bits. Panel (a) shows the results for GG noise and $N_B = 2$ and 3, and (b) the results for GG noise and
$N_B = 4$ and 5. Panels (c) and (d) show the results for ST noise: in (c) $N_B = 2$ and 3 are considered, while in (d) $N_B = 4$ and 5.

The Laplacian distribution was not tested because, for this
distribution, the optimal adaptive estimator in the continuous
case is already an adaptive estimator with a binary quantizer.
This can be seen easily if one evaluates $I_q$ as a function of the
thresholds: the result will be a constant for all possible sets
of thresholds, meaning that they are unimportant. Moreover, if
the $\eta_i$ are evaluated, one will find that they are all equal; therefore
only the sign of the difference between the observations and
the last estimate is important. Consequently, the loss found in
this case would be a constant for all $N_B$.
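The Laplacian property can be checked numerically. The sketch below assumes the unit-scale Laplacian density $f(x) = \frac{1}{2} e^{-|x|}$, for which the continuous Fisher information is 1, and the standard interval formula for $I_q$; the function names are hypothetical. It also assumes that zero is one of the thresholds, as is the case for a symmetric quantizer with an even number of intervals; the last line of the demo shows that the property is lost otherwise.

```python
import math

def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))

def laplace_cdf(x):
    return 0.5 * math.exp(x) if x < 0.0 else 1.0 - 0.5 * math.exp(-x)

def fisher_quantized(inner_thresholds):
    # I_q = sum over intervals of (f(lo) - f(hi))^2 / (F(hi) - F(lo)).
    edges = [-math.inf] + sorted(inner_thresholds) + [math.inf]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        f_lo = 0.0 if math.isinf(lo) else laplace_pdf(lo)
        f_hi = 0.0 if math.isinf(hi) else laplace_pdf(hi)
        p_lo = 0.0 if math.isinf(lo) else laplace_cdf(lo)
        p_hi = 1.0 if math.isinf(hi) else laplace_cdf(hi)
        total += (f_lo - f_hi) ** 2 / (p_hi - p_lo)
    return total

if __name__ == "__main__":
    # Threshold sets that contain zero: I_q stays at the continuous value 1.
    for thresholds in ([0.0], [-1.0, 0.0, 1.0], [-2.5, -0.3, 0.0, 0.3, 2.5]):
        print(thresholds, fisher_quantized(thresholds))
    # Without a threshold at zero, the interval straddling the median is wasted.
    print([-1.0, 1.0], fisher_quantized([-1.0, 1.0]))
```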

To validate the results, the adaptive algorithms will be 
simulated and the loss obtained will be compared to the 
approximations given above. The simulation results will be 
presented in the same order as before, first the constant case, 
then the Wiener process case and finally the case with drift. 
All the simulations were done considering $N_B \in \{2, 3, 4, 5\}$.

2) Simulated loss - Constant: in the constant case, the 7
types of noise with evaluated $L_q$ were tested. The value of
$X_0 = x$ was set to zero and the initial condition of the
adaptive algorithm was set with a small error ($\hat{X}_0 \in \{0, 10\}$).
The number of samples was set to 5000 to have sufficient
points for convergence. The algorithm was simulated $2.5 \times 10^6$
times and the error results were averaged to produce a
simulated MSE. Based on the simulated MSE, a simulated loss
was calculated; it is shown in Fig. 4.

The simulated results seem to converge to the theoretical
approximations of $L_q$, thus validating these approximations.
This also means that the variance of estimation tends in
simulation to the CRB for quantized observations, validating
the fact that the algorithm is asymptotically optimal. The
convergence time appears to be related to $N_B$: when $N_B$
increases, the time to get closer to the optimal performance
decreases.
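A scaled-down version of this experiment can be sketched for the simplest case, a single threshold (1-bit quantizer) and Gaussian-shaped noise. The code below is a hedged illustration, not the simulation used in the paper: it assumes a decreasing gain $1/(k I_q)$ and output coefficients $\pm 2 f(0)$, which is the usual stochastic-approximation choice for which the MSE is expected to approach $1/(k I_q)$; the function name, the seed and the much smaller number of runs and samples are arbitrary.

```python
import math
import random

def simulate_one_bit(n_samples=2000, n_runs=400, x_true=0.0, x0=0.0, seed=0):
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)          # GG with beta = 2, delta = 1 has variance 1/2
    f0 = 1.0 / math.sqrt(math.pi)   # noise density at zero
    i_q = 4.0 * f0 * f0             # Fisher information of the sign measurement
    eta = 2.0 * f0                  # assumed magnitude of the output coefficients
    sq_err = 0.0
    for _ in range(n_runs):
        x_hat = x0
        for k in range(1, n_samples + 1):
            y = x_true + rng.gauss(0.0, sigma)
            bit = 1.0 if y >= x_hat else -1.0   # 1-bit quantized innovation
            x_hat += eta * bit / (k * i_q)      # assumed gain 1 / (k * I_q)
        sq_err += (x_hat - x_true) ** 2
    mse = sq_err / n_runs
    bound = 1.0 / (n_samples * i_q)             # CRB for quantized observations
    return mse, bound

if __name__ == "__main__":
    mse, bound = simulate_one_bit()
    i_c = 2.0  # continuous Fisher information for this noise
    print("simulated MSE / quantized CRB:", mse / bound)
    print("simulated loss in dB:", 10.0 * math.log10(mse * 2000 * i_c))
```

If the assumptions hold, the ratio printed first should be close to 1 and the simulated loss close to the theoretical 1-bit value for Gaussian noise, up to Monte Carlo fluctuations.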

3) Simulated loss - Wiener process: for a Wiener process,
the loss was evaluated by setting $X_0 = 0$ and $\hat{X}_0$ randomly
around it; then $10^4$ realizations with $10^5$ samples were simulated
and the MSE was estimated by averaging the realizations
of the squared error for each instant. As it was observed
that the error was approximately stationary after $k = 1000$,
the sample mean squared error was also averaged over time, resulting in
an estimate of the asymptotic MSE. Based on the obtained
values of the MSE, a simulated loss was evaluated. The results
for the 7 types of noise and $\sigma_w = 0.001$ are shown in Fig. 5.

As expected, the results have the same form as the theoretical
loss given in Fig. 3. To verify the results for different $\sigma_w$,
the loss was also evaluated through simulation for $\sigma_w = 0.1$
in the Gaussian (GG with $\beta = 2$) and Cauchy (ST with
$\beta = 1$) cases. The results are shown in Fig. 6, where the theoretical
losses for these cases are also shown. It is clear from the results
that $X_k$ must move slowly to give a performance close to the
theoretical results, but it is also interesting that the simulated
loss seems to have the same decreasing rate as a function of
$N_B$ when compared to the theoretical results. This means that
the dependence of the MSE on $I_q$ still seems to be correct, and
it indicates that even in a faster regime for $X_k$, the thresholds
can be set by maximizing $I_q$.

Fig. 5. Wiener process. Simulated quantization performance loss for a
Wiener process with $\sigma_w = 0.001$, different types of noise and number
of quantization bits.

Fig. 6. Wiener process. Comparison of simulated and theoretical losses in
the Gaussian and Cauchy noise cases when estimating a Wiener process with
$\sigma_w = 0.1$ or $\sigma_w = 0.001$.

Fig. 7. Wiener process with drift. Comparison of simulated and theoretical
losses in the Gaussian and Cauchy noise cases for estimating a Wiener process
with constant mean drift $u_k = 10^{-4}$ and standard deviation $\sigma_w = 10^{-4}$.

4) Simulated loss - Wiener process with drift: for $X_k$ with
drift, $W_k$ was simulated with mean and standard deviation
$u_k = \sigma_w = 10^{-4}$, which represents a slow linear drift with
small random fluctuations. The initial conditions were set to
be equal, $X_0 = \hat{X}_0$, and the drift estimator was set with constant
gain $\gamma_k^u = 10^{-5}$. Its initial condition was set to the true $u_k$
to reduce the transient time and consequently the simulation
time. As $u_k$ is constant, the loss evaluation was done in the
same form as for $X_k$ without drift, based on averaging through
realizations and time. The results for the Gaussian and Cauchy
cases are shown in Fig. 7.

The small offset between simulated and theoretical results is
produced by the joint estimation of $u_k$. Note that keeping $\gamma_k^u$
at a small constant value allows slow variations
in $u_k$ to be followed adaptively. The convergence to the simulated loss in Fig. 7 was also
obtained for simulations with errors in the initial conditions,
but in this case the transient regime was very long, indicating
that other schemes might be considered when the theoretical
performance is needed in a short period of time. Multi-step
adaptive algorithms could be used for faster convergence to the
theoretical performance, but they would need a precise model
for the evolution of the drift, which is not considered here.
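The drift estimator with constant gain used in this experiment can be illustrated as a first-order smoother applied to the differences between successive estimates. The recursion below is an assumed form, $\hat{u}_k = \hat{u}_{k-1} + \gamma\,(\hat{X}_k - \hat{X}_{k-1} - \hat{u}_{k-1})$, chosen to match the description of (74) as a smoothing of successive differences; the function name and the test values are hypothetical.

```python
def smooth_drift(estimates, gain, u0=0.0):
    """Track the drift by smoothing differences of successive estimates.

    Assumed recursion: u_hat <- u_hat + gain * ((x_k - x_{k-1}) - u_hat).
    Returns the list of drift estimates, one per difference.
    """
    u_hat = u0
    history = []
    for prev, cur in zip(estimates[:-1], estimates[1:]):
        u_hat += gain * ((cur - prev) - u_hat)
        history.append(u_hat)
    return history

if __name__ == "__main__":
    # Noise-free ramp with slope 0.5: the estimate converges to the slope.
    ramp = [0.5 * k for k in range(401)]
    print(smooth_drift(ramp, gain=0.05)[-1])
```

A small gain averages over many differences, which reduces the noise in the drift estimate at the price of a long transient; this is consistent with the very long transient regime reported when the initial conditions contain errors.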

VI. Conclusions 

In this work an adaptive estimation algorithm based on 
quantized observations was proposed. Based on observations 
with additive noise and quantized with adjustable offset and 
gain, the objective was to estimate with a low complexity 
online adaptive algorithm a scalar parameter that could follow 
one of three models, constant, Wiener process and Wiener 
process with drift. Under the hypothesis that the noise PDF 
is symmetric and strictly decreasing, and that the quantizer is
also symmetric, by using Lyapunov theory it was shown that 
for the optimal quantizer output coefficients, the algorithm is 
asymptotically stable. It was also shown that the asymptotic 
performance in terms of mean squared error could be opti- 
mized by using static update coefficients that depend only 
on the shape of the observation noise and on the quantizer 
thresholds. 

Performance results were obtained based on the optimal 
choice of the quantizer output levels. It was observed that 
the effect of quantization on performance could be quanti- 
fied by the Fisher information of the quantized observations. 
Thus, this clearly indicates that the quantizer thresholds must 
be placed to maximize the Fisher information. It was also 
observed that for the three models, the loss of performance 
of the algorithm w.r.t. the optimal continuous measurement is 
given by a function of the ratio of the corresponding Fisher 
informations. 

For testing the results, two different families of noise were 
considered, generalized Gaussian noise and Student's-t noise, 
both under uniform quantization. First, the theoretical loss was 
evaluated for different numbers of quantization intervals. The 
results indicate that with only a few quantization bits (4 and 
5) the adaptive algorithm performance is very close to the 
continuous observation case and it was observed that uniform 
quantization seems to penalize more estimation performance 
under heavy tailed distributions. 

Estimation in the three possible scenarios was simulated and
the results validated the accuracy of the theoretical approximations.
In the constant case it was observed that the algorithm
performance was very close to the Cramér-Rao bound; in the
Wiener process case it was observed that the theoretical results
are very accurate for small increments of the Wiener process;
and in the drift case it was seen that, by accepting a small
increase in the mean squared error, it is possible to estimate
the drift jointly.

Another interesting result is that a varying parameter has a 
loss of performance smaller than a constant parameter, thus a 
type of dithering effect seems to be present. In this case, the 
variations of the input signal make the tracking performance
of the estimator get close to the continuous measurement
performance. 

The fact that the number of quantization bits does not
influence the estimation performance much leads to the
conclusion that it seems more reasonable to focus on using more
sensors than on using high resolution quantizers for increasing
performance. Consequently, this motivates the use of sensor
network approaches.

As the Fisher information for quantized measurements plays 
a central role in the performance of the algorithms, the study 
of its properties as a function of the noise type and quantizer 
thresholds seems to be a subject for future work. A possible 
approach for the study of its general behavior would be to 
consider high resolution approximations. 

Finally, as in practice sensor noise scale parameter and 
Wiener process increment standard deviation can be unknown 
and slowly variable, it would be also interesting to study 
how the algorithm design and performance would change by 
estimating all these parameters jointly. 